1 Project Summary


Objective:  This  project  extends  statistical  model  checking  methods  to  enable  reasoning  about 
nondeterministic  systems  and  extremely  rare  events. 

Approach: To address nondeterministic systems, we investigated a theoretical framework integrating
semi-exhaustive simulation with hierarchical abstraction of models. For rare events, we investigated
importance sampling methods that rely on variance minimization and cross-entropy methods to optimize
biasing distributions, allowing statistical methods to reason accurately about low-probability events.

Outcome/Impact: Statistical model checking methods scale better than traditional analytic methods
for very large systems; this research is critical to enabling statistical methods to reason about
realistic systems involving nondeterminism and low-probability events. Our new methods will be
implemented in the PRISMATIC tool, supporting testing, evaluation, and eventually application to
verification of real-world systems.


2  Statistical  Model  Checking  for  Markov  Decision  Processes 

We have been investigating the use of techniques for model-checking systems described as probabilistic
automata that have both statistical elements and pure (unquantified) nondeterminism.

Typically, we model-check safety assertions in bounded-time LTL (BLTL), such as $\Pr(F_{\le 10}\,\phi) < P$:
the probability that $\phi$ will occur within 10 time steps is less than P, where $\phi$ is some
undesirable property such as system failure or deadlock. The problem of model-checking such systems is
equivalent to solving a Markov Decision Process (MDP), where we must show that no matter how an
adversary resolves the nondeterministic choices, the probability of $\phi$ is less than P; i.e., that
the system description forces a win against a (nondeterministic) adversary who is trying to make $\phi$
occur with a probability of at least P.

These  model-checking  problems  are  extremely  computationally  demanding,  both  in  terms  of 
time  and  space.  CMU  researchers  have  addressed  this  problem  with  their  development  of  Statistical 
Model  Checking  for  Markov  Decision  Processes  (SMCMDP)  [3].  The  SMCMDP  algorithm  operates 
in two phases: the first phase uses learning to find the worst-case policy for the nondeterministic
adversary, and the second phase uses that learned policy to simplify the problem and solve for a
model-checking conclusion. The first phase resolves the nondeterminism with an initial probability
distribution, and then uses multiple rounds of Monte Carlo sampling and reinforcement learning to
improve the policy for nondeterministic choices with respect to satisfying a Bounded Linear Temporal
Logic  (BLTL)  property.  The  second  phase  uses  the  best  learned  policy  to  reduce  an  MDP  to  a  fully 
probabilistic  Markov  chain,  on  which  known  statistical  model  checking  methods  may  be  applied  to 
give  an  approximate  solution  to  the  problem  of  checking  the  probabilistic  BLTL  property. 
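Once the learned policy has resolved the nondeterminism, the second phase is ordinary statistical model checking of a Markov chain. The following is a minimal sketch that assumes each state of the resolved chain maps to a list of (probability, successor) pairs. It estimates the probability of reaching a goal state within k steps by simulation, taking the sample count from the Hoeffding bound. The function names and the fixed-sample estimator are our own illustration. SMCMDP's actual statistical test may differ.

```python
import math
import random

def hoeffding_samples(eps, delta):
    """Samples needed so that |estimate - p| <= eps with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def step(chain, state, rng):
    """Draw a successor of `state`; chain[state] is a list of (probability, successor)."""
    r, acc = rng.random(), 0.0
    for p, nxt in chain[state]:
        acc += p
        if r < acc:
            return nxt
    return chain[state][-1][1]  # guard against floating-point round-off

def trace_hits_goal(chain, start, goal, k, rng):
    """Simulate one trace of at most k transitions; True if a goal state is visited."""
    state = start
    if state in goal:
        return True
    for _ in range(k):
        state = step(chain, state, rng)
        if state in goal:
            return True
    return False

def estimate_bounded_reach(chain, start, goal, k, eps=0.01, delta=0.05, seed=0):
    """Monte Carlo estimate of Pr(goal reached within k steps) on a Markov chain."""
    rng = random.Random(seed)
    n = hoeffding_samples(eps, delta)
    hits = sum(trace_hits_goal(chain, start, goal, k, rng) for _ in range(n))
    return hits / n

# Hypothetical chain: from s0, fail with probability 0.2 per step, otherwise stay.
chain = {"s0": [(0.2, "fail"), (0.8, "s0")], "fail": [(1.0, "fail")]}
p_hat = estimate_bounded_reach(chain, "s0", {"fail"}, k=10)
# The exact value is 1 - 0.8**10 (about 0.893); p_hat lies within eps of it
# with probability at least 1 - delta.
```

Comparing p_hat with the threshold P gives a reliable verdict only when the gap between them exceeds eps. A smaller gap requires a smaller eps, and therefore more samples.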

In  the  last  year,  we  have  investigated  AO*  search  and  Monte  Carlo  Tree  Search  algorithms  to 
complement  and  enhance  CMU’s  SMCMDP. 
2.1 Challenges in Statistical Model Checking for Markov Decision Processes

CMU’s SMCMDP implementation can substantially ease the runtime requirements of our model-checking
problems. It adapts easily to parallel and multi-core systems, providing further speedups.
Nevertheless, challenges remain, in particular:

• While the SMCMDP technique soundly demonstrates property violations (where the probability of $\phi$
exceeds the desired value), it cannot accurately identify cases where the property is necessarily
satisfied.

• To use efficient MDP-solving techniques, SMCMDP can derive only memoryless, stationary policies for
the adversary. This can compromise identification of violations in cases where the adversary must
take time into account to force the system to violate the property with the required probability. In
other words, SMCMDP reasons only about a limited, memoryless form of adversary, but there can be
situations where stateful adversaries are more hazardous, so the SMCMDP assurances are not sound in
general.

•  When  there  are  extreme  probabilities  in  the  models,  sampling  in  SMCMDP  converges  slowly. 
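A standard normal-approximation argument explains the slow convergence with extreme probabilities. This is our own illustration, not a measurement from this project. A crude Monte Carlo estimate of a probability p from n samples has relative standard error $\sqrt{(1-p)/(np)}$. The number of samples needed for a fixed relative error therefore grows roughly as 1/p.

```python
import math

def samples_for_relative_error(p, rel_err, z=1.96):
    """Approximate number of i.i.d. samples for a crude Monte Carlo estimate of p
    to have relative error at most rel_err at the confidence level implied by z
    (z = 1.96 for about 95%), using the normal approximation to the binomial."""
    return math.ceil(z ** 2 * (1.0 - p) / (p * rel_err ** 2))

for p in (0.5, 1e-3, 1e-6):
    print(p, samples_for_relative_error(p, rel_err=0.1))
```

At 10% relative error, p = 0.5 needs a few hundred samples, while p = 10^-6 needs hundreds of millions. This gap motivates the importance-sampling work described in the project summary.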

2.2  AO*  Search 

We  have  been  developing  methods  to  check  bounded-time  properties  of  probabilistic  automata 
using  heuristic  search.  The  strengths  (and  weaknesses)  of  heuristic  search  nicely  complement  those 
of  sampling  methods  and  dynamic  programming  (as  in  PRISM).  In  particular,  when  the  heuristic 
performs  well,  we  can  avoid  enumerating  the  full  state  space.  Like  dynamic  programming,  but 
unlike  statistical  methods,  AO*  search  guarantees  the  correctness  of  the  probability  bounds  it 
computes. This is particularly important when we fail to find a counterexample for a claim: if we
report that a system is safe because it cannot reach a safety-violating state, s, with a probability
greater than P, we can make this claim with confidence. Since the sampling methods do not currently
provide bounds on the quality of the policy (counterexample) they compute, we cannot currently make
such safety claims with confidence.¹

Search  algorithm:  We  have  implemented  a  version  of  AO*  search  as  an  extension  to  the  PRISM 
probabilistic  model  checker.  AO*  search  explores  an  AND/OR  tree,  so  we  can  use  it  to  find  the 
probability  of  reachability  for  a  property  in  PRISM’s  Probabilistic  LTL.  By  finding  the  maximum 
probability  of  reachability,  we  can  check  properties  of  the  form  “what  is  the  maximum  probability 
of reaching a state that satisfies $\phi$ in less than k steps?" The problem is an AND/OR search, rather
than a simple graph reachability, because we must compute the best choice for reachability (OR) for
all of the possible outcomes of the probabilistic branches (AND).

¹ We can be probabilistically certain that we cannot reach s with a probability greater than P, based
on the adversary policy chosen by sampling, but we cannot currently provide informative guarantees that
the adversary policy is optimal, or close enough to optimal to justify the safety claim.
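The sketch below makes the AND/OR structure concrete by evaluating a depth-bounded AND/OR tree exhaustively on a small hypothetical MDP. OR nodes take the maximum over the adversary's actions. AND nodes take the probability-weighted sum over outcomes. This is not AO* itself: AO* computes the same value but expands only the part of the tree that its admissible heuristic cannot rule out. The dictionary encoding of the MDP is our own assumption, not PRISM's data structure.

```python
# Hypothetical toy MDP: state -> {action: [(probability, next_state), ...]}
TOY_MDP = {
    "s0": {"a": [(0.5, "s1"), (0.5, "s2")], "b": [(1.0, "s2")]},
    "s1": {"go": [(0.9, "bad"), (0.1, "s2")]},
    "s2": {"stay": [(1.0, "s2")], "try": [(0.3, "bad"), (0.7, "s2")]},
    "bad": {"stay": [(1.0, "bad")]},
}

def max_reach(mdp, goal, state, k):
    """Maximum probability, over adversary choices, of reaching `goal` within k steps."""
    if state in goal:
        return 1.0  # property already satisfied
    if k == 0:
        return 0.0  # out of time
    # OR node: the adversary picks the action with the highest value ...
    return max(
        # ... AND node: every probabilistic outcome contributes, weighted by its probability
        sum(p * max_reach(mdp, goal, nxt, k - 1) for p, nxt in outcomes)
        for outcomes in mdp[state].values()
    )

# With 2 steps, action "a" gives 0.5 * 0.9 + 0.5 * 0.3 = 0.6, better than "b" (0.3).
value = max_reach(TOY_MDP, {"bad"}, "s0", 2)
```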

Our initial implementation was based on the text by Edelkamp and Schrödl [2]. We were hampered by a
substantial error in the book’s presentation of the algorithm. We have reported that error to the
authors; an erratum will be prepared. In addition, we made substantial modifications
to  the  algorithm  in  order  to  make  it  (1)  interface  with  the  methods  and  data  structures  of  the 
PRISM  model  checker,  (2)  incorporate  an  informed  heuristic  (see  below),  and  (3)  exploit  special 
features  of  the  MDP  search  problems. 

Heuristic: To get acceptable performance from AO*, we must have an informed and admissible heuristic.
The heuristic must be admissible, or we cannot ensure that we will find the optimal probability of
reachability. We have used heuristics inspired by methods that have been found effective in AI
planning. We initially experimented with simple reachability, simply distinguishing between states
from which the goal is and is not reachable. This could be computed efficiently using BDDs, but was not
sufficiently informative. We replaced this “disjunctive” heuristic with a “metric” heuristic that
computes an (over)estimate of the probability of reaching the goal from each state. We can compute
this efficiently by doing a backwards reachability computation from the goal state, implemented using
ADDs. An exact backwards reachability computation is the dynamic programming method used by PRISM, so
for efficient computation we must relax it; the relaxation must over-estimate the probability, because
admissibility requires an over-estimate when maximizing. We do this by quantifying away the action
decisions, effectively acting as if we could take all the decisions from an OR node. In our
experiments so far, this heuristic estimate provides a good compromise between information content and
efficient computability (see the preliminary example below). Further abstraction in the heuristic may
prove necessary as we experiment with more models.
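The sketch below shows one way to realize "quantifying away the action decisions". It is an assumption about the relaxation, not a description of the ADD implementation. All actions of a state are merged into a single pseudo-action. The probability of moving to s' under the pseudo-action is the largest probability any action assigns to s'. The bounded backward computation then runs on the merged model, with values capped at 1. Because $\sum_{s'} \max_a P(s,a,s')\,h(s') \ge \max_a \sum_{s'} P(s,a,s')\,h(s')$, the result never under-estimates the true maximum reachability probability, which is what admissibility requires here. The toy MDP is hypothetical.

```python
# Hypothetical toy MDP: state -> {action: [(probability, next_state), ...]}
TOY_MDP = {
    "s0": {"a": [(0.5, "s1"), (0.5, "s2")], "b": [(1.0, "s2")]},
    "s1": {"go": [(0.9, "bad"), (0.1, "s2")]},
    "s2": {"stay": [(1.0, "s2")], "try": [(0.3, "bad"), (0.7, "s2")]},
    "bad": {"stay": [(1.0, "bad")]},
}

def relaxed_heuristic(mdp, goal, k):
    """Over-estimate of the k-step maximum reachability probability for every state."""
    # Quantify away the actions: for each successor keep the largest probability
    # that any action of the state assigns to it.
    merged = {}
    for s, actions in mdp.items():
        best = {}
        for outcomes in actions.values():
            for p, nxt in outcomes:
                best[nxt] = max(best.get(nxt, 0.0), p)
        merged[s] = best
    # Bounded backward computation on the merged (action-free) model, capped at 1.
    h = {s: 1.0 if s in goal else 0.0 for s in mdp}
    for _ in range(k):
        h = {
            s: 1.0 if s in goal
            else min(1.0, sum(p * h[nxt] for nxt, p in merged[s].items()))
            for s in mdp
        }
    return h

h2 = relaxed_heuristic(TOY_MDP, {"bad"}, 2)
# h2["s0"] is 0.75, an over-estimate of the exact 2-step value 0.6.
```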

Performance of Search Algorithm: As an example, see Figure 1, which shows a preliminary comparison
between our AO* search algorithm and PRISM on a scaled set of WLAN examples. The WLAN examples were
taken from the PRISM web page (http://www.prismmodelchecker.org/casestudies/wlan.php). This model
describes the handshake and randomized exponential backoff rules used for collision avoidance in the
IEEE 802.11 standard [5]. For more details on the PRISM modeling and verification, see [5]. We scale
the problems by extending the length of the permissible backoff in the face of collisions. As the
PRISM results show, this causes the state space, and hence the run time, to grow. The AO* search, on
the other hand, guided by its heuristic, is not sensitive to the growth in the state space. We will
examine different models to identify which are most suitable for which solution methods (search,
sampling, dynamic programming).

Figure 1: Comparison between the AO* algorithm and PRISM’s dynamic programming on scaled WLAN problems
(plot: “PRISM/AO* Comparison for wlan reachability”).

As we perform more comparisons, we expect to find weaknesses in the AO* implementation and make
improvements. For example, recent tests of the performance and correctness of the algorithm revealed
inefficient data structures for representing the best partial solution in the search. Improving these
data structures provided a substantial speedup.

We are also investigating whether to extend our AO* search to AO* branch-and-bound search [6, 7].
Branch-and-bound might allow us to prune substantial parts of the search space, providing memory
savings, particularly when handling very large models.

2.3  Monte  Carlo  Tree  Search 

The Monte Carlo sampling process in SMCMDP can take a long time to converge. This problem can manifest
itself either in the first phase, where reinforcement learning is used to find an adversary policy
(resolving nondeterminism in the model), or in the second phase, when, after the nondeterminism has
been resolved, we sample from the resulting Markov chain to evaluate the BLTL property’s worst-case
probability.

To  improve  performance  in  the  first  phase  of  SMCMDP,  SIFT  has  been  experimenting  with 
Monte  Carlo  Tree  Search  (MCTS)  methods  [1].  These  methods  have  been  very  successful  in  difficult 
search  applications,  including  Computer  Go,  and  planning  under  uncertainty.  SIFT  has  developed 
two sampling methods using the Upper Confidence Bounds Applied to Trees (UCT) Monte Carlo Tree Search
algorithm [4]: offline UCT, which computes a policy over the full state space, and online UCT, which
estimates the probability of the property against the optimal adversary.
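Inside its search tree, UCT selects actions with the UCB1 rule, which adds an exploration bonus to each action's average reward. Below is a minimal sketch of that selection step. The statistics layout is an assumption, as is the exploration constant c = √2 (the textbook UCB1 choice). Neither is a parameter taken from our implementation.

```python
import math

def ucb1_select(stats, c=math.sqrt(2)):
    """Pick an action by UCB1.

    stats maps each action to (visits, total_reward). Unvisited actions are tried
    first; otherwise return the action maximizing
    mean_reward + c * sqrt(ln(total_visits) / visits).
    """
    for action, (visits, _) in stats.items():
        if visits == 0:
            return action
    total = sum(visits for visits, _ in stats.values())
    return max(
        stats,
        key=lambda a: stats[a][1] / stats[a][0]
        + c * math.sqrt(math.log(total) / stats[a][0]),
    )
```

In a setting like ours, the reward of a sampled trace could be 1 if it satisfies the property and 0 otherwise, so the mean reward estimates the adversary's success probability for that action.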
Model      States   True          Threshold         Learning   Correct   Correct
                    probability   probability (P)   samples    w/o UCT   w/ UCT
CSMA 2 2     1038   0.861         0.85              2000       12%       77%
CSMA 2 4     7958   0.768         0.76              2000       18%       70%
CSMA 2 6    66718   0.616         0.61              2000       25%       54%
CSMA 2 6    66718   0.616         0.61              4000       22%       74%
Figure 2: Initial results of UCT-guided SMCMDP on different-sized models of the probabilistic CSMA
protocol. For each model, we asked both SMCMDP and SMCMDP with UCT-guided sampling, using the given
number of samples, whether the threshold probability is less than the true probability. The
percentage correct is out of 100 runs.

Offline  UCT:  Our  offline  UCT  method  simply  replaces  the  sampling  method  in  SMCMDP  with 
UCT.  We  expected  that  UCT’s  nice  property  of  balancing  exploitation  of  known  good  actions  with 
exploration  of  seldom  explored  actions  would  lead  to  finding  an  optimal  (or  near-optimal)  adversary 
policy  with  fewer  samples.  We  have  implemented  offline  UCT  as  an  extension  to  PRISMATIC. 

Our  initial  experiments  with  offline  UCT  looked  promising.  As  seen  in  Figure  2,  offline  UCT 
yielded  the  correct  answer  much  more  often  than  “vanilla”  SMCMDP,  learning  from  the  same 
number  of  samples,  for  difficult  problems  where  the  threshold  probability  is  very  close  to  the  true 
probability.  This  indicated  that  UCT  helped  SMCMDP  learn  a  better  policy  with  the  same  number 
of  samples. 


Figure  3:  How  often  offline  UCT  and  SMCMDP  learn  the  correct  policy  for  an  easier  (left) 
and  more  difficult  (right)  version  of  our  satellite  model. 


Next,  we  compared  how  many  samples  it  took  SMCMDP  and  offline  UCT  to  learn  the  optimal 
adversarial  policy  for  a  scalable  satellite  control  model  we  created.  As  seen  in  Figure  3,  offline 
UCT  converges  to  the  optimal  policy,  but  with  the  default  parameters  SMCMDP  did  not  converge 
to  the  optimal  policy.  This  model  is  very  small  (15  states,  52  transitions,  30  actions),  so  it  was 
discouraging to see how many traces it took to find the optimal policy.


Figure 4: Probability estimates from SMCMDP on a WLAN model (plot: wlan0), averaged over 100 runs.

We  also  ran  experiments  on  some  larger  models,  where  we  do  not  have  the  (very  large)  optimal 
policy  readily  available,  but  we  do  have  the  true  underlying  probability  of  the  property.  To  evaluate 
offline  UCT  on  these  models,  we  had  it  learn  the  policy,  then  estimate  the  probability  of  the 
property  using  that  policy.  These  experiments  show  that  the  quality  of  offline  UCT’s  policy  lags 
behind  SMCMDP  with  a  small  number  of  traces,  but  does  catch  up  with  more  traces  (see  Figure  4). 

We  believe  offline  UCT’s  need  for  more  samples  than  SMCMDP  results  from  a  difference  in 
bookkeeping  between  the  two  algorithms.  When  SMCMDP  takes  a  trace,  it  remembers  the  reward 
in  every  state  along  that  trace.  When  UCT  takes  a  trace,  it  only  remembers  the  rewards  for  states 
that  have  been  expanded  in  its  search  tree.  We  will  see  if  modifying  offline  UCT  to  remember 
rewards  along  all  states  in  every  trace  will  reduce  the  number  of  traces  required  to  learn  a  good 
policy,  at  the  cost  of  using  more  memory. 

Online  UCT:  The  typical  application  of  UCT  to  playing  a  game  (like  Go)  does  not  involve 
computing a policy for the entire state space, as our offline UCT algorithm attempts. Rather, it runs
online: it takes only one action at a time (a move), then senses the opponent’s move, and repeats until
the game finishes. Our online UCT algorithm follows this game-playing analogy,
with  our  moves  being  the  resolution  of  non-determinism,  the  opponent’s  moves  being  the  resulting 
probabilistic  transitions,  and  a  game  is  won  if  we  satisfy  the  property  and  lost  if  we  do  not. 

Figure 5: Online UCT probability estimates after 800 games, with 1000 samples per move, on WLAN models.
The models range from 6,063 to 5,007,666 states. The true underlying probability is about 0.184.

In this framework, each “game” results in one trace through the system, using the best-looking action
from each state as determined by UCT sampling. After playing many games, we can look at how often the
property is satisfied to get an estimate of the probability of the property.
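The game-playing loop can be sketched as follows. To keep it short, the sketch departs from full UCT in two ways. Each move is chosen by a flat UCB1 bandit over the current state's actions, and the bandit uses uniformly random rollouts instead of a growing search tree. The MDP encoding and function names are hypothetical. The defaults of 800 games and 1000 samples per move simply mirror the Figure 5 settings. The moves actually played form one particular adversary. The expected fraction of games won is therefore at most the optimal adversary's probability.

```python
import math
import random

def sample_outcome(outcomes, rng):
    """Draw a successor from a list of (probability, next_state) pairs."""
    r, acc = rng.random(), 0.0
    for p, nxt in outcomes:
        acc += p
        if r < acc:
            return nxt
    return outcomes[-1][1]  # guard against floating-point round-off

def rollout(mdp, goal, state, steps_left, rng):
    """Uniformly random playout: 1.0 if a goal state is reached within steps_left steps."""
    while state not in goal:
        if steps_left == 0:
            return 0.0
        outcomes = rng.choice(list(mdp[state].values()))
        state = sample_outcome(outcomes, rng)
        steps_left -= 1
    return 1.0

def choose_move(mdp, goal, state, steps_left, samples, rng, c=math.sqrt(2)):
    """Flat UCB1 over the actions of `state`; returns the action with the best mean reward."""
    stats = {a: [0, 0.0] for a in mdp[state]}  # action -> [visits, total reward]
    for t in range(samples):  # t = number of samples taken so far
        untried = [a for a, (n, _) in stats.items() if n == 0]
        if untried:
            a = untried[0]
        else:
            a = max(stats, key=lambda b: stats[b][1] / stats[b][0]
                    + c * math.sqrt(math.log(t) / stats[b][0]))
        nxt = sample_outcome(mdp[state][a], rng)
        stats[a][0] += 1
        stats[a][1] += rollout(mdp, goal, nxt, steps_left - 1, rng)
    return max(stats, key=lambda b: stats[b][1] / max(stats[b][0], 1))

def play_game(mdp, goal, start, k, samples, rng):
    """One game: we resolve nondeterminism move by move; chance resolves the outcomes."""
    state, steps_left = start, k
    while state not in goal and steps_left > 0:
        action = choose_move(mdp, goal, state, steps_left, samples, rng)
        state = sample_outcome(mdp[state][action], rng)
        steps_left -= 1
    return state in goal

def estimate_online(mdp, goal, start, k, games=800, samples=1000, seed=0):
    """Fraction of games in which the property (reaching `goal` within k steps) held."""
    rng = random.Random(seed)
    wins = sum(play_game(mdp, goal, start, k, samples, rng) for _ in range(games))
    return wins / games
```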

We have implemented online UCT as an extension to PRISMATIC, and started running experiments.
Figure 5 shows online UCT’s probability estimates for the set of WLAN models from
PRISM’s  case  studies.  All  estimates  are  under  10%  error,  with  a  small  number  of  games  relative 
to  the  size  of  the  state  space.  Also  note  the  quality  of  the  probability  estimates  is  not  sensitive  to 
the  size  of  the  model’s  state  space,  indicating  that  online  UCT  focuses  on  the  interesting  parts  of 
the  state  space. 

Currently the number of games to play is given as input. We plan to use Wald’s Sequential Probability
Ratio Test (SPRT) as a termination criterion, which minimizes the expected number of games played to
achieve a given level of confidence in our answer [8]. Likewise, the number of samples taken prior to
making each move is given as input; we will investigate optimizing this as well.
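Below is a minimal sketch of Wald's SPRT as a stopping rule for the games. It assumes each game yields a Bernoulli outcome (property satisfied or not). The test is H0: p = p0 against H1: p = p1, with p1 < p0. One choice is p0 = θ + δ and p1 = θ − δ around a threshold θ, where δ is the half-width of the indifference region. The symbols θ and δ and the function name are our own notation. The log-likelihood ratio is updated after every game. The test stops as soon as the ratio crosses one of Wald's two thresholds.

```python
import math
import random

def sprt(draw, p0, p1, alpha=0.01, beta=0.01, max_games=1_000_000):
    """Wald's SPRT for Bernoulli outcomes: H0: p = p0 versus H1: p = p1 (p1 < p0).

    draw() plays one game and returns True if the property was satisfied.
    alpha bounds the probability of accepting H1 when H0 holds; beta bounds the
    probability of accepting H0 when H1 holds. Returns (decision, games_played).
    """
    accept_h1 = math.log((1 - beta) / alpha)   # upper threshold
    accept_h0 = math.log(beta / (1 - alpha))   # lower threshold
    step_win = math.log(p1 / p0)               # LLR increment for a satisfied game
    step_loss = math.log((1 - p1) / (1 - p0))  # LLR increment for an unsatisfied game
    llr = 0.0
    for n in range(1, max_games + 1):
        llr += step_win if draw() else step_loss
        if llr >= accept_h1:
            return "H1", n
        if llr <= accept_h0:
            return "H0", n
    return "undecided", max_games

# Threshold 0.5 with indifference half-width 0.05: H0: p = 0.55, H1: p = 0.45.
rng = random.Random(7)
decision, games = sprt(lambda: rng.random() < 0.9, p0=0.55, p1=0.45)
```

When the true probability is far from the threshold, the test stops after few games. Near the threshold it needs many more, which is the trade-off controlled by the indifference region.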

Another  opportunity  for  improvement  is  to  share  information  between  games.  The  games  online 
UCT  plays  are  all  currently  independent;  that  is,  none  of  the  sampling  results  from  one  game  are 
carried  to  future  games.  Carrying  forward  some  data  between  games  could  improve  the  results. 
We  could,  for  example,  keep  a  cache  of  frequently-encountered  states,  or  learn  an  “opening  book” 
of  good  moves  near  the  initial  state. 

2.4  Time-Dependent  Policies 

One limitation of CMU’s SMCMDP is that it produces only stationary, memoryless policies (adversaries).
But when performing bounded-time model-checking, the optimal adversary policy is in general
time-dependent. That means that a model-checker using a stationary policy may incorrectly label some
systems as safe. The challenge of finding time-dependent policies is twofold: (1) keeping track of
time in the model causes state space explosion, and (2) in general, a separate policy is required for
every time unit, making the policy very large.

We have developed several test problems for experimenting with time-dependent vs. stationary
adversaries. For the simple example in Figure 6, each transition takes one time tick, and the
adversary wins by driving the model into fail. He starts at s0 and must choose action L or S; after
that, the transitions are determined by the indicated probabilities. With 4 or more time ticks left,
L has the highest probability of failure. However, with 3 time ticks left, fail is unreachable through
L, so the adversary should choose action S.

Figure 6: A simple example where the optimal adversary requires a time-dependent policy.
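Finite-horizon backward induction makes the time dependence explicit. For every number of remaining ticks k, it computes the best failure probability V[k][s] and the maximizing action policy[k][s]. The sketch below runs it on a hypothetical model with the same shape as the description of Figure 6. The intermediate states l1 to l3 and the probabilities 0.9 and 0.3 are invented for illustration. Action L reaches fail only after four ticks, while S can fail after one. The output also shows why time-dependent policies are large: one table is stored per time step.

```python
# Hypothetical model shaped like the Figure 6 description; probabilities are invented.
FIG6_LIKE = {
    "s0": {"L": [(1.0, "l1")], "S": [(0.3, "fail"), (0.7, "safe")]},
    "l1": {"go": [(1.0, "l2")]},
    "l2": {"go": [(1.0, "l3")]},
    "l3": {"go": [(0.9, "fail"), (0.1, "safe")]},
    "fail": {"stay": [(1.0, "fail")]},
    "safe": {"stay": [(1.0, "safe")]},
}

def time_dependent_policy(mdp, goal, horizon):
    """Backward induction. V[k][s] is the maximum probability of reaching `goal` within
    k ticks from s; policy[k][s] is the action achieving it (one table per k)."""
    V = [{s: 1.0 if s in goal else 0.0 for s in mdp}]
    policy = [{}]
    for k in range(1, horizon + 1):
        prev, vk, pk = V[-1], {}, {}
        for s, actions in mdp.items():
            if s in goal:
                vk[s] = 1.0
                continue
            best_action, best_value = None, -1.0
            for a, outcomes in actions.items():
                value = sum(p * prev[nxt] for p, nxt in outcomes)
                if value > best_value:
                    best_action, best_value = a, value
            vk[s], pk[s] = best_value, best_action
        V.append(vk)
        policy.append(pk)
    return V, policy

V, policy = time_dependent_policy(FIG6_LIKE, {"fail"}, horizon=5)
for k in range(1, 6):
    print(k, policy[k]["s0"], V[k]["s0"])  # S for k <= 3, L for k >= 4
```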

We  have  begun  exploring  methods  for  efficiently  developing  time-dependent  adversaries.  One 
possibility,  suggested  by  our  adoption  of  UCT,  is  to  compute  the  adversary’s  “moves”  on-line, 
avoiding  the  need  to  store  full  policies.  Other  techniques  we  are  considering  include  exploiting 
structured  state  models,  and  compressing  large  policies. 

3  Delta-complete  Analysis 

We have developed the framework of delta-complete analysis [9] for bounded reachability problems of
general hybrid systems. We perform bounded reachability checking by solving delta-decision problems
over the reals. The techniques take into account robustness properties of the systems under numerical
perturbations. We prove that the verification problems become much more mathematically tractable in
this new framework. Our implementation of the techniques, the open-source tool dReach, scales well on
several highly nonlinear hybrid system models that arise in biomedical and robotics applications. We
also developed a framework that gives upper bounds on the computational complexity of stability
problems for a wide range of nonlinear continuous and hybrid systems. To do so, we describe stability
properties of dynamical systems using first-order formulas over the real numbers, and reduce stability
problems to the delta-decision problems of these formulas. The framework allows us to obtain a precise
characterization of the complexity of different notions of stability for nonlinear continuous and
hybrid systems. We proved that bounded versions of the stability problems are generally decidable, and
gave upper bounds on their complexity. The unbounded versions are generally undecidable, and for these
we gave upper bounds on their degrees of unsolvability.
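For reference, these are the usual definitions behind delta-decisions, as commonly stated in the delta-decision literature with atoms written as equalities f(x) = 0. The delta-weakening of a bounded formula loosens every equality by δ. A delta-complete procedure may answer "δ-sat" whenever the weakened formula holds, even if the original formula is false.

```latex
\varphi \;=\; \exists^{I}\mathbf{x}.\;\bigwedge_{i}\bigvee_{j} f_{ij}(\mathbf{x}) = 0
\qquad\leadsto\qquad
\varphi^{\delta} \;=\; \exists^{I}\mathbf{x}.\;\bigwedge_{i}\bigvee_{j} \lvert f_{ij}(\mathbf{x})\rvert \le \delta

\text{A delta-complete procedure returns } \mathsf{unsat} \text{ only if } \varphi \text{ is false,}
\text{ and } \delta\text{-}\mathsf{sat} \text{ only if } \varphi^{\delta} \text{ is true.}

\text{Example: } \exists x \in [0,1].\; x^{2} + 0.001 = 0 \text{ is false, but for } \delta = 0.01
\text{ its weakening } \lvert x^{2} + 0.001\rvert \le 0.01 \text{ holds at } x = 0,
\text{ so either answer is allowed.}
```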

We developed a novel approach for solving the probabilistic bounded reachability problem of hybrid
systems with parameter uncertainty [10]. Standard approaches to this problem require numerical
solutions for large optimization problems, and become infeasible for systems involving nonlinear
dynamics over the reals. Our approach combines randomized sampling of probabilistic system parameters,
SMT-based bounded reachability analysis, and statistical tests. We utilize delta-complete decision
procedures to solve reachability analysis in a sound way, i.e., we always decide correctly if, for a
given combination of parameters, the system actually reaches the unsafe region. Compared to standard
simulation-based analysis methods, our approach supports nondeterministic branching, increases the
coverage of simulation, and avoids the zero-crossing problem. We demonstrate that our method is
feasible for general hybrid systems with parametric uncertainty by applying our implemented tool,
SReach, to various nonlinear hybrid systems with parametric uncertainty.

Final Report, March 14, 2016

DISTRIBUTION A: Distribution approved for public release.

We found serious bugs in floating-point computations for evaluating elementary functions in the
Embedded GNU C Library [11]. For instance, the sine function can return values larger than $10^{53}$ in
certain rounding modes. Further investigation also exposed faulty implementations in the most recent
version of the library, which seemingly fixed some bugs, but only by discarding user-specified
rounding-mode requirements. We describe how these bugs were spotted and how they affected the
implementation of our SMT solver dReal.